Clinical coding of long COVID in primary care 2020–2023 in a cohort of 19 million adults: an OpenSAFELY analysis

Summary Background Long COVID is the patient-coined term for the persistent symptoms of COVID-19 illness for weeks, months or years following the acute infection. Self-reported data indicate a large global burden of long COVID, but the epidemiology, causes and treatments remain poorly understood. Primary care is used to help identify and treat patients with long COVID and therefore Electronic Health Records (EHRs) of past COVID-19 patients could be used to help fill these knowledge gaps. We aimed to describe the incidence and differences in demographic and clinical characteristics in recorded long COVID in primary care records in England. Methods With the approval of NHS England we used routine clinical data from over 19 million adults in England linked to SARS-COV-2 test result, hospitalisation and vaccination data to describe trends in the recording of 16 clinical codes related to long COVID between November 2020 and January 2023. Using OpenSAFELY, we calculated rates per 100,000 person-years and plotted how these changed over time. We compared crude and adjusted (for age, sex, 9 NHS regions of England, and the dominant variant circulating) rates of recorded long COVID in patient records between different key demographic and vaccination characteristics using negative binomial models. Findings We identified a total of 55,465 people recorded to have long COVID over the study period, which included 20,025 diagnosis codes and 35,440 codes for further assessment. The incidence of new long COVID records increased steadily over 2021, and declined over 2022. The overall rate per 100,000 person-years was 177.5 cases in women (95% CI: 175.5–179) and 100.5 in men (99.5–102). The majority of those with a long COVID record did not have a recorded positive SARS-COV-2 test 12 or more weeks before the long COVID record.
Interpretation In this descriptive study, EHR-recorded long COVID was very low between 2020 and 2023, and incident records of long COVID declined over 2022. Using EHR diagnostic or referral codes unfortunately has major limitations in identifying and ascertaining true cases and timing of long COVID. Funding This research was supported by the National Institute for Health and Care Research (NIHR) (OpenPROMPT: COV-LT2-0073).


Information governance and ethical approval
NHS England is the data controller of the NHS England OpenSAFELY COVID-19 Service. TPP is the data processor; all study authors using OpenSAFELY have the approval of NHS England. This implementation of OpenSAFELY is hosted within the TPP environment which is accredited to the ISO 27001 information security standard and is NHS IG Toolkit compliant. Patient data has been pseudonymised for analysis and linkage using industry standard cryptographic hashing techniques; all pseudonymised datasets transmitted for linkage onto OpenSAFELY are encrypted; access to the NHS England OpenSAFELY COVID-19 service is via a virtual private network (VPN) connection; the researchers hold contracts with NHS England and only access the platform to initiate database queries and statistical models; all database activity is logged; only aggregate statistical outputs leave the platform environment following best practice for anonymisation of results such as statistical disclosure control for low cell counts.
The service adheres to the obligations of the UK General Data Protection Regulation (UK GDPR) and the Data Protection Act 2018. The service previously operated under notices initially issued in February 2020 by the Secretary of State under Regulation 3(4) of the Health Service (Control of Patient Information) Regulations 2002 (COPI Regulations), which required organisations to process confidential patient information for COVID-19 purposes; this set aside the requirement for patient consent. As of 1 July 2023, the Secretary of State has requested that NHS England continue to operate the Service under the COVID-19 Directions 2020. In some cases of data sharing, the common law duty of confidence is met using, for example, patient consent or support from the Health Research Authority Confidentiality Advisory Group.
Taken together, these provide the legal bases to link patient datasets using the service. GP practices, which provide access to the primary care data, are required to share relevant health information to support the public health response to the pandemic, and have been informed of how the service operates. This research is part of the OpenPROMPT study "Quality-of-life in patients with long COVID: harnessing the scale of big data to quantify the health and economic costs" which has ethical approval from HRA and Health and Care Research Wales (HCRW) (IRAS project ID 304354). The Study Coordination Centre has obtained approval from the LSHTM Research Ethics Committee (ref 28030), as well as a favourable opinion from the South Central-Berkshire B Research Ethics Committee (ref 22/SC/0198).

Hierarchical long COVID definition
We separated SNOMED-CT codes that indicated long COVID in a patient health record into those that were clinical diagnoses, and other referral/assessment codes. We established a hierarchical search for the first record of long COVID because we determined that a diagnosis code gave stronger evidence of the presence of long COVID, and therefore a more accurate timing of the record as well. We therefore searched a patient's record for a diagnosis code first, and if none existed then searched for a referral/assessment code. If neither code type existed then that participant was classified as not having long COVID.
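The hierarchical search described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's OpenSAFELY code: the `CodedEvent` type, the `"diagnosis"` and `"referral"` labels and the function name are hypothetical, and taking the earliest event within each code type is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class CodedEvent:
    """One long COVID code in a patient's record (hypothetical structure)."""
    event_date: date
    code_type: str  # "diagnosis" or "referral" (hypothetical labels)


def first_long_covid_record(events: Iterable[CodedEvent]) -> Optional[CodedEvent]:
    """Prefer diagnosis codes; fall back to referral/assessment codes.

    Returns None when neither code type is present, i.e. the patient is
    classified as not having recorded long COVID.
    """
    events = list(events)
    for wanted in ("diagnosis", "referral"):
        matches = [e for e in events if e.code_type == wanted]
        if matches:
            # Assumption: the earliest event of the preferred type is used.
            return min(matches, key=lambda e: e.event_date)
    return None
```

Note that under this rule a later diagnosis code takes precedence over an earlier referral code, which is why the hierarchy affects the timing of the first record as well as its type.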

Secondary cohort
In a separate cohort, we investigated the impact of having EHR-recorded long COVID on further vaccination rates. We therefore developed a secondary cohort that included only those with a record of long COVID, followed up until January 2023 or loss to follow-up, and we summarised vaccine coverage in this cohort as of January 2023.
The secondary cohort included all patients with a record of Long COVID before they had a vaccination. Individuals entered this cohort at the time of a diagnosis (or referral) for long COVID. Individuals were excluded if they had already received a vaccination at this point. We included all individuals who met these criteria from 1st November 2020 to 22nd October 2021, when the first booster vaccination was delivered (1). We followed up this cohort until January 2023 and measured the number of vaccine doses these individuals received over the study period. We calculated the rate of vaccination in this cohort and compared this to the overall rate of vaccination in the primary cohort.
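As a rough illustration of the entry rule for the secondary cohort, the sketch below checks a single patient. The function and argument names are hypothetical, and two details are assumptions not stated in the text: the entry window is treated as inclusive at both ends, and a vaccination on the same day as the long COVID record counts as already vaccinated.

```python
from datetime import date
from typing import Optional

# Entry window taken from the text; inclusivity of both ends is an assumption.
ENTRY_START = date(2020, 11, 1)
ENTRY_END = date(2021, 10, 22)


def eligible_for_secondary_cohort(
    long_covid_date: Optional[date],
    first_vaccine_date: Optional[date],
) -> bool:
    """Return True if the patient enters the secondary cohort (sketch)."""
    if long_covid_date is None:
        return False  # no long COVID record, so never enters the cohort
    if not (ENTRY_START <= long_covid_date <= ENTRY_END):
        return False  # record falls outside the entry window
    # Excluded if a vaccine dose was received on or before cohort entry
    # (same-day handling is an assumption).
    return first_vaccine_date is None or first_vaccine_date > long_covid_date
```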

Long COVID after hospitalisation
We also defined Long COVID outcomes dependent on previous SARS-COV-2 history. All records of Long COVID were further divided into the following groups: 1 Level of multimorbidity Categorised as 0, 1, 2+.
Comorbidities are assessed at study entry. Relevant comorbidities were defined based on previous research of risk factors for Long COVID in OpenSAFELY (6). A comorbidity was counted if there was a previous code, 6 months to 5 years before March 2020, for one or more of: diabetes; cancer; haematological cancer; asthma; chronic respiratory disease; chronic cardiac disease; chronic liver disease; stroke or dementia; other neurological condition; organ transplant; dysplasia; rheumatoid arthritis, systemic lupus erythematosus or psoriasis; or other immunosuppressive conditions. Those with no relevant code for a condition were assumed not to have that condition. The number of conditions was categorised into "0", "1", and "2 or more".
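The multimorbidity categorisation can be illustrated with a short Python sketch. The function name and the dictionary-of-flags input are hypothetical; the only rules taken from the text are that a missing code is treated as absence of the condition and that counts are grouped into "0", "1" and "2 or more".

```python
from typing import Mapping


def multimorbidity_category(condition_flags: Mapping[str, bool]) -> str:
    """Categorise the number of recorded comorbidities (sketch).

    `condition_flags` maps a condition name to True when a relevant code was
    found in the look-back window. Conditions that are missing from the
    mapping are treated as absent, mirroring the assumption in the text.
    """
    n_conditions = sum(1 for present in condition_flags.values() if present)
    if n_conditions == 0:
        return "0"
    if n_conditions == 1:
        return "1"
    return "2 or more"
```

For example, a patient with codes for asthma and diabetes only would be placed in the "2 or more" group.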

Statistical analysis
The crude rate was expressed per 100,000 person-years as $(n/T) \times 100{,}000$, where $n$ is the number of events and $T$ the total follow-up time of observation, with 95% confidence intervals from $\text{rate}/EF$ to $\text{rate} \times EF$, where $EF = \exp(1.96/\sqrt{n})$ (7). We visualised the temporal dynamics of long COVID captured in EHRs on a weekly scale and further summarised the number of recorded cases on a daily scale. In plots, we stratified the weekly totals by diagnosis or referral codes, the three most prevalent long COVID SNOMED codes in the data, vaccination status and sex. We compared the dynamics of long COVID to the total recorded cases from the UK Coronavirus dashboard (8).
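A minimal sketch of the crude rate calculation follows, assuming the confidence interval uses the standard error-factor form, i.e. the interval runs from rate/EF to rate × EF with EF = exp(1.96/√n). The function name and the input values in the usage note are illustrative and are not study data.

```python
import math
from typing import Tuple


def crude_rate_per_100k(
    events: int, person_years: float, z: float = 1.96
) -> Tuple[float, float, float]:
    """Crude rate per 100,000 person-years with an error-factor 95% CI.

    Assumes EF = exp(z / sqrt(events)); returns (rate, lower, upper).
    """
    if events <= 0 or person_years <= 0:
        raise ValueError("events and person_years must both be positive")
    rate = events / person_years * 100_000
    error_factor = math.exp(z / math.sqrt(events))
    return rate, rate / error_factor, rate * error_factor
```

Calling `crude_rate_per_100k(100, 100_000)` gives a rate of about 100 per 100,000 person-years. Because the interval is built on the log scale, it is symmetric around the rate multiplicatively rather than additively.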

Recording long COVID in vaccinated groups
We compared the rate of recorded long COVID between people with 0, 1, 2, 3+ SARS-COV-2 vaccinations. Due to the uncertainty in the accuracy and timing of long COVID onset in our descriptive analysis, these results are not generalisable and do not reflect any meaningful causal relationship between SARS-COV-2 vaccinations and the probability of new onset long COVID.
Recorded long COVID is an imprecise measure of the actual incidence of the condition. These results are therefore presented as simple comparisons of an imperfect outcome measure and no causal or aetiological conclusions can be drawn from them; however, we believe they still have value to inform future research and they have been presented for completeness.
In the primary cohort, followed up until they received a long COVID code, the crude rate of recorded long COVID was lowest for people with 3 or more vaccine doses (103.5 per 100,000 person-years; 95% CI: 101.5-105) (Figure S7, Table S1). We estimated the rates in different vaccine dose groups adjusted for age, sex, region and variant, but this is not equivalent to a vaccine effect estimate. The rate ratio for recorded long COVID was 0.85 (95% CI: 0.73-0.99) at least 14 weeks after one vaccine dose, 0.58 (95% CI: 0.5-0.68) after 2 doses, and 0.15 (95% CI: 0.12-0.18) after 3 or more doses, with similar patterns for long COVID diagnosis codes only (Figure S7, Table S2). The rate of recording of long COVID was lower in people who received an mRNA-based vaccine for their first dose (RR 0.41; 95% CI: 0.3-0.47) compared with unvaccinated people. Those who received adenovirus-based (or other non-mRNA formulated) vaccines as a first dose still had lower rates of long COVID in the analysis but with a rate ratio closer to null (0.87; 95% CI: 0.77-0.99). To quality assure the analysis we repeated the models with COVID-19 hospitalisation as an outcome and found associations with age, sex, and vaccination that are consistent with previous research (Figure S7).
There is a lag between vaccination and protection from infection, and a further lag until a diagnosis or referral for long COVID can be made. In our main analysis we assumed that this gap is 14 weeks in total. In a sensitivity analysis we expanded this time gap to 18 or 26 weeks. These results were consistent with our main findings of reduced rates of long COVID with increasing vaccine doses, and a lower rate ratio for those receiving mRNA than non-mRNA vaccines as a first dose (Figure S8). We also repeated the analysis stratified by three broad categorisations of the dominant circulating variant of SARS-COV-2 (wild/alpha, delta, omicron). We found that the rate ratios for the effect of vaccination were lowest for long COVID recording during the wild/alpha and delta periods and higher during omicron, but were consistently lowest in those with 3+ vaccine doses (Figure S9).
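One way to picture the lag assumption is to count only those doses given at least a fixed number of weeks before the date of interest. The sketch below is a hypothetical illustration of that idea, not the data management used in the study: the function name, the inclusive cutoff and the "3+" label are assumptions, and `lag_weeks` can be changed to mimic the sensitivity analyses.

```python
from datetime import date, timedelta
from typing import Iterable


def doses_counted_at(
    outcome_date: date, vaccine_dates: Iterable[date], lag_weeks: int = 14
) -> str:
    """Number of doses given at least `lag_weeks` before `outcome_date`.

    Doses given inside the lag window are ignored (assumption: the cutoff
    date itself is counted). Returns "0", "1", "2" or "3+".
    """
    cutoff = outcome_date - timedelta(weeks=lag_weeks)
    n_doses = sum(1 for d in vaccine_dates if d <= cutoff)
    return "3+" if n_doses >= 3 else str(n_doses)
```

With doses on 1 February, 1 May and 15 November 2021 and an outcome date of 1 January 2022, a 14-week lag counts two doses, because the third dose falls inside the lag window.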

Secondary cohort analysis
In a secondary cohort we continued to follow people after their long COVID record. In this exploratory analysis we found that the percentage of unvaccinated people was greater in those with a previous long COVID record compared to those without, and this difference was largest in the youngest (18-29) and oldest (70+) age groups (Figure S10).

Discussion
Statistical methods (a) Describe the primary statistical methods used to estimate the measure of disease occurrence being targeted; discuss assumptions of that method in light of data limitations (e.g., assumption of independent censoring for people lost to follow-up). (b) If any adjustment/standardization will be done, state the goal of such adjustment.

Participants
Report numbers of individuals at each study stage (this is likely to be approximate for the target population); consider summarizing this information in a flow diagram.

Variation in incidence of long COVID recording in England & Table 2 Descriptive data (a) Report on the characteristics of the analytical sample in a "Table 1." (b) Indicate the number of participants with missing data for each variable used in the analysis. (c) If any weighting or imputation is done to reconstruct the study sample or target populations, include columns for those populations.
Table 2 Outcome data (a) Present an overall (unstratified) estimate of the measure of occurrence of interest. (b) Report "crude" (raw data in the analytical sample) and (if applicable) "corrected" (after any weighting or imputation) estimates.
Recorded long COVID rates vary between population groups & Figure 3 Other analyses Present prespecified stratum-specific or adjusted/standardized results.

Figure S5: Monthly count of new long COVID records over time for any long COVID code (red), or long COVID diagnosis codes only (blue) (left-hand axis). These trends are plotted alongside the 7-day rolling average of positive COVID-19 tests available from the UK Health Security Agency (right-hand axis). To define long COVID, records were searched for a diagnosis code first (Any long COVID diagnosis code); if no code existed then we searched for a referral code (Any long COVID code). If neither code existed then the individual was classified as not having long COVID.

Figure S6: Crude rate ratios for records of long COVID and COVID-19 hospitalisation. Rate ratios are estimated from negative binomial regression models with a single covariate in each model (age category, sex, number of vaccine doses or whether the first dose was mRNA). To define long COVID, records were searched for a diagnosis code first (Any long COVID diagnosis code); if no code existed then we searched for a referral code (Any long COVID code). If neither code existed then the individual was classified as not having long COVID.

Figure S7: Adjusted rate ratios for records of long COVID and COVID-19 hospitalisation. Rate ratios are estimated from negative binomial regression models adjusted for age, sex, 9 NHS regions of England, and the dominant variant circulating. To define long COVID, records were searched for a diagnosis code first (Any long COVID diagnosis code); if no code existed then we searched for a referral code (Any long COVID code). If neither code existed then the individual was classified as not having long COVID.

Figure S8: Sensitivity analysis of negative binomial models for vaccine covariates. Results show the rate ratios for records of long COVID and COVID-19 hospitalisation. Rate ratios are estimated from negative binomial regression models adjusted for age, sex, 9 NHS regions of England, and the dominant variant circulating. The models are run under three different data management scenarios. The first ("12 weeks") is the primary analysis; the others show results when the gap between vaccine date and long COVID/end of follow up is extended to 16 and 24 weeks. To define long COVID, records were searched for a diagnosis code first (Any long COVID diagnosis code); if no code existed then we searched for a referral code (Any long COVID code). If neither code existed then the individual was classified as not having long COVID.

Figure S9: Adjusted rate ratios for records of long COVID and COVID-19 hospitalisation stratified by the dominant variant circulating. Rate ratios are estimated from negative binomial regression models adjusted for age, sex and 9 NHS regions of England over three different time periods (wildtype/alpha, 1 November 2020 - 16 May 2021; Delta, 16 May 2021 - 1 December 2021; Omicron, 1 December 2021 - 31 Jan 2023). To define long COVID, records were searched for a diagnosis code first (Any long COVID diagnosis code); if no code existed then we searched for a referral code (Any long COVID code). If neither code existed then the individual was classified as not having long COVID.

Figure S10: Coverage of vaccination after a record of long COVID (blue) compared with those without a record of long COVID (red). Bars show the percentage of people with 0, 1, 2, or 3+ vaccine doses as of January 2023.

Figure 3
Figure 3 and Figure S7

Table S1: Crude rates of incident long COVID codes in primary care between November 2020 and January 2023, stratified by demographic and clinical characteristics. All counts are rounded to the nearest 5 and counts less than 10 are redacted.

Table S3: Demographic and clinical characteristics at baseline of 55,465 individuals with a record of long COVID during the study period, stratified by whether they had a record of a positive SARS-COV-2 test at least 12 weeks before the long COVID code. Evidence for a difference between those with/without a positive test is shown with p-values from a Chi-squared test for each "Variable". All counts are rounded to the nearest 5 and counts <10 have been redacted.
or methods used to extrapolate data from the analytical sample to the study population and from the study population to the target population.
(c) If the study is longitudinal, specify the time origin and follow-up period for the measure of occurrence; if the study is cross-sectional, specify the time anchor at which the health state is summarized for individuals.